DETAILED ACTION

Claim Objections 

1. Claims 20-22 are objected to because of the following informalities:

Claims 20-22 recite the limitation "the frames". There is insufficient antecedent 
basis for this term, because each of claims 20-22 depends from claim 17 (which does 
not mention "frames"). 

Appropriate correction is required. 

Claim Rejections - 35 USC § 112

2. The following is a quotation of the first paragraph of 35 U.S.C. 112:

The specification shall contain a written description of the invention, and of the manner and process of 
making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the
art to which it pertains, or with which it is most nearly connected, to make and use the same and shall 
set forth the best mode contemplated by the inventor of carrying out his invention. 

3. Claim 11 is rejected under 35 U.S.C. 112, first paragraph, as failing to comply
with the enablement requirement. The claim(s) contains subject matter which was not
described in the specification in such a way as to enable one skilled in the art to which it
pertains, or with which it is most nearly connected, to make and/or use the invention.
Claim 11 requires that segments in a database are coded by LPC, GSM coding, or
"other coding schemes". The term "other coding schemes" is not enabled by the 
specification, because "other coding schemes" may include any coding scheme 
available. Clearly, the specification does not provide an adequate description of every 
available coding scheme known. Thus, "other coding schemes" is not enabled by the 
specification. 
4. The following is a quotation of the second paragraph of 35 U.S.C. 112:

The specification shall conclude with one or more claims particularly pointing out and distinctly 
claiming the subject matter which the applicant regards as his invention. 

5. Claims 2, 11, and 21 are rejected under 35 U.S.C. 112, second paragraph, as
being indefinite for failing to particularly point out and distinctly claim the subject matter 
which applicant regards as the invention. 

Claim 2 is directed to an output waveform synthesizer that is "essentially the 
same as the synthesizer used in a conventional parametric synthesizer". The term 
"essentially the same" renders the claim indefinite because it cannot be determined 
exactly what requirements must be met so that a particular waveform synthesizer is
"essentially the same" as a conventional waveform synthesizer. 

Claim 11 requires that segments in a database are coded by LPC, GSM coding,
or "other coding schemes". The term "other coding schemes" renders the claim
indefinite because it cannot be determined what other coding schemes are included by
this term. 

Regarding claim 21, the phrase "e.g." (meaning "for example") renders the claim
indefinite because it is unclear whether the limitation(s) following the phrase are part of
the claimed invention. See MPEP § 2173.05(d). 
Claim Rejections - 35 USC § 102

6. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that 
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(b) the invention was patented or described in a printed publication in this or a foreign country or in public 
use or on sale in this country, more than one year prior to the date of application for patent in the United 
States. 

7. Claims 1-5, 12, and 17-20 are rejected under 35 U.S.C. 102(b) as being 
anticipated by Page et al. (U.S. Patent 6,175,821). 

In regard to claim 1, Page et al. disclose a speech synthesizer having an output
stage (Fig. 1, elements 3, 4, and 6) for converting a phonetic description to an acoustic
output, the output stage including a database of recorded utterance segments (recorded 
speech 3), in which the output stage: 

a. converts the phonetic description to a plurality of time-varying parameters (text 
to speech synthesizer 4 converts a series of diphones into parameters indicating 
phoneme boundaries and pitchmarks, column 4, line 62 to column 5, line 21); 

b. interprets the parameters as a key for accessing the database to identify an 
utterance segment in the database (once the speech synthesizer is completed, 
message generator 6 accesses recorded speech database 3 to retrieve portions of the 
message, column 5, lines 36-45), and 

c. outputs the identified utterance segment (the retrieved recorded segments and 
synthesized speech are concatenated and output, column 5, lines 45-47); 

in which the output stage further comprises an output waveform synthesizer that 
can generate an output signal from the parameters, whereby, in the event that the 
parameters describe an utterance segment for which there is no corresponding
recording in the database, the parameters are passed to the output waveform
synthesizer to generate an output signal (variable portions of the message which are 
not available in the recorded speech database 3 are generated by the speech 
synthesizer 4 and included in the output signal, see Fig. 2 and column 5, line 65 to 
column 6, line 21). 
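As an illustrative sketch only (not code from Page et al. or from the application), the claim 1 arrangement mapped above, in which the parameters serve as a lookup key and the waveform synthesizer is used as a fallback, might be expressed as follows. The function and argument names and the tuple-of-integers parameter key are hypothetical assumptions.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical parameter key: one frame's time-varying parameters, quantized to integers.
ParamKey = Tuple[int, ...]


def output_stage(
    frames: List[ParamKey],
    database: Dict[ParamKey, List[float]],
    synthesize: Callable[[ParamKey], List[float]],
) -> List[float]:
    """Interpret each frame's parameters as a key into the database of recorded
    utterance segments; when no recording corresponds, pass the parameters to
    the output waveform synthesizer instead."""
    output: List[float] = []
    for params in frames:
        segment = database.get(params)
        if segment is None:
            # No corresponding recording: generate the segment by synthesis.
            segment = synthesize(params)
        output.extend(segment)  # concatenate segments into one output signal
    return output


db = {(1, 2): [0.1, 0.2]}
print(output_stage([(1, 2), (3, 4)], db, lambda p: [0.0, 0.0]))
```

Here the first frame is found in the database and copied out, while the second has no recording and is filled by the stand-in synthesizer.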

In regard to claim 2, Page et al. disclose the output waveform synthesizer is 
essentially the same as the synthesizer used in a conventional parametric synthesizer 
(text to speech converter 4 is a standard TTS, column 4, lines 5-8). 

In regard to claim 3, Page et al. disclose the database is populated to achieve a 
compromise between quality and memory requirement most appropriate to a specific 
application (since certain portions are often used, they are stored in recorded speech 
database 3 to make the recording sound as natural as possible, column 5, lines 58-61). 

In regard to claim 4, Page et al. disclose the database is populated with 
segments that are most likely to be required to generate a range of output 
corresponding to the application of the synthesizer (since certain portions are often 
used, they are stored in recorded speech database 3 to make the recording sound as 
natural as possible, column 5, lines 58-61). 
In regard to claim 5, Page et al. disclose the database is populated with
utterance segments derived from speech by a particular individual speaker (the same 
speaker, column 2, lines 23-24). 

In regard to claim 12, Page et al. disclose the parameters are generated in
regular periodic frames (see Figs. 5A-5C, regular pitch marks 79-91). 

In regard to claim 17, Page et al. disclose a method of synthesizing speech 
comprising: 

a. generating from a phonetic description a plurality of time-varying parameters 
that describe an output waveform (text to speech synthesizer 4 converts a series of 
diphones into parameters indicating phoneme boundaries and pitchmarks, column 4, 
line 62 to column 5, line 21);

b. interpreting the parameters to identify an utterance segment within a database 
of such segments that corresponds to the audio output defined by the parameters and 
retrieving the segment to create an output waveform (once the speech synthesizer is 
completed, message generator 6 accesses recorded speech database 3 to retrieve 
portions of the message, column 5, lines 36-45); and

c. outputting the output waveform (the retrieved recorded segments and 
synthesized speech are concatenated and output, column 5, lines 45-47); 

in which, if no utterance segment is identified in the database in step b, as 
corresponding to the parameters, an output waveform for output in step c is generated 
by synthesis (variable portions of the message which are not available in the recorded
speech database 3 are generated by the speech synthesizer 4 and included in the 
output signal, see Fig. 2 and column 5, line 65 to column 6, line 21). 

In regard to claim 18, Page et al. disclose steps a to c are repeated in quick 
succession to create an impression of a continuous output (the retrieved recorded 
segments and the synthesized speech are concatenated to form a continuous output, 
see Fig. 2 and column 5, lines 45-47). 

In regard to claim 19, Page et al. disclose the parameters are generated in 
discrete frames, and steps a to c are performed once for each frame (see Figs. 5A-5C,
regular pitch marks 79-91: these are generated for the entire message, column 4, line
62 to column 5, line 21). 

In regard to claim 20, Page et al. disclose the frames are generated with a 
regular periodicity (see Figs. 5A-5C, regular pitch marks 79-91).

Claim Rejections - 35 USC § 103 

8. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 
9. Claim 6 is rejected under 35 U.S.C. 103(a) as being unpatentable over Page et
al., in view of Holm et al. (U.S. Patent 5,850,629).

In regard to claim 6, Page et al. do not disclose the database is populated with 
utterance segments derived from speech by speakers of a particular gender. 

Holm et al. disclose a database for speech synthesis, wherein the database is 
populated with utterance segments derived from speech by speakers of a particular 
gender (male or female voice databases can be selected, column 9, lines 3-6). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to modify Page et al. to populate the database with utterance segments 
derived from speech by speakers of a particular gender, because this would allow a 
user to select a voice sound that they preferred. 

10. Claim 7 is rejected under 35 U.S.C. 103(a) as being unpatentable over Page et 
al., in view of Nukaga et al. (U.S. Patent 7,113,909). 

In regard to claim 7, Page et al. do not disclose the database is populated with
utterance segments derived from speech by speakers having a particular accent. 

Nukaga et al. disclose a database for speech synthesis, wherein the database is 
populated with utterance segments derived from speech by speakers having a particular 
accent (a memory is populated with a specific speech style, column 5, lines 48-59; 
which includes a particular accent, column 6, lines 17-24). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to modify Page et al. to populate the database with utterance segments 
derived from speech by speakers having a particular accent, because providing a
particular accent is advantageous in the internationalization of a device, as taught by 
Nukaga et al. (column 2, lines 37-41). 

11. Claims 8, 9, 16, 22, and 23 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Page et al., in view of Kamai et al. (U.S. Patent 5,864,812). 

In regard to claims 8 and 9, Page et al. do not disclose the database is indexed 
or that the index values for accessing the database are the values of the time-varying 
parameters. 

Kamai et al. disclose a speech synthesizer that comprises an indexed database 
(Fig. 24, F1 index and F2 index, column 15, lines 19-22), wherein time varying 
parameters converted from phonetic descriptions are used as index values for 
accessing the database (phoneme information is converted to parameters, column 16, 
lines 37-42 and lines 57-67; which are then used to access the speech segment 
database, column 17, lines 4-28). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to modify Page et al. to index the database by values of the time varying 
parameters, because indexing the database by the time varying parameters allows the 
synthesized speech to be generated "moment to moment", thus reducing the size of the 
database, as taught by Kamai et al. (column 17, line 62 to column 18, line 9). 
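A minimal sketch of a database whose index values are the values of time-varying parameters, as in claims 8 and 9, is shown below for illustration. It assumes, hypothetically, that the parameters are the first two formant frequencies (loosely echoing the F1 and F2 indices of Kamai et al., Fig. 24) and that they are quantized in 50 Hz steps; the class, method names, and step size are illustrative assumptions, not taken from either reference.

```python
from typing import Dict, List, Optional, Tuple

STEP_HZ = 50.0  # hypothetical quantization step for formant frequencies


def index_key(f1_hz: float, f2_hz: float, step: float = STEP_HZ) -> Tuple[int, int]:
    """Quantize the first two formant frequencies into integer index values."""
    return (round(f1_hz / step), round(f2_hz / step))


class IndexedSegmentDB:
    """Segment database indexed directly by (quantized) parameter values."""

    def __init__(self) -> None:
        self._segments: Dict[Tuple[int, int], List[float]] = {}

    def add(self, f1_hz: float, f2_hz: float, samples: List[float]) -> None:
        self._segments[index_key(f1_hz, f2_hz)] = samples

    def lookup(self, f1_hz: float, f2_hz: float) -> Optional[List[float]]:
        # The parameter values themselves, once quantized, select the entry.
        return self._segments.get(index_key(f1_hz, f2_hz))


db = IndexedSegmentDB()
db.add(700.0, 1200.0, [0.5, 0.4])
print(db.lookup(710.0, 1190.0))
print(db.lookup(300.0, 2300.0))
```

An entry stored at F1 = 700 Hz, F2 = 1200 Hz is retrieved by a lookup at 710 Hz and 1190 Hz because both pairs quantize to the same index values; an unrelated pair returns None, which is where a synthesizer fallback would apply.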
In regard to claims 16 and 22, Page et al. do not disclose the parameters
correspond to speech formants. 

Kamai et al. disclose the parameters correspond to speech formants (phoneme
information is converted to formant information, column 16, lines 37-42 and lines 57-67; 
which are then used to access the speech segment database, column 17, lines 4-28). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to modify Page et al. to derive formants as parameters for accessing the 
database, because indexing the database by formants allows the synthesized speech to 
be generated "moment to moment", thus reducing the size of the database, as taught by 
Kamai et al. (column 17, line 62 to column 18, line 9). 

In regard to claim 23, Page et al. do not disclose the output segments for any 
one frame are selected as a function of the parameters of several frames. 

Kamai et al. disclose the output segments for any one frame are selected as a function
of the parameters of several frames (the formant frequencies selected from the
database depend on the speech segment currently being synthesized and the type of
consonant connected next, column 17, lines 29-33). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to modify Page et al. to base the output for any one frame as a function of the 
parameters of several frames, because this would result in more natural sounding 
output. 
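The claim 23 arrangement, in which the segment for one frame is selected as a function of the parameters of several frames, can be illustrated with a hypothetical two-frame context: the current frame's parameter key is paired with the next frame's key, and a context-independent entry is used when no context-specific one exists. This is a sketch under those assumptions only; the function name, key format, and fallback rule are not drawn from Kamai et al. or Page et al.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Key = Tuple[int, ...]  # hypothetical per-frame parameter key
ContextKey = Tuple[Key, Optional[Key]]  # (current frame, next frame or None)


def select_segments(
    frames: Sequence[Key], db: Dict[ContextKey, str]
) -> List[Optional[str]]:
    """Choose a segment for each frame from its own parameters and those of the
    following frame; fall back to a context-free entry keyed with None."""
    chosen: List[Optional[str]] = []
    for i, current in enumerate(frames):
        nxt = frames[i + 1] if i + 1 < len(frames) else None
        segment = db.get((current, nxt))
        if segment is None:
            segment = db.get((current, None))
        chosen.append(segment)
    return chosen


db = {((1,), (2,)): "a before b", ((1,), None): "a", ((2,), None): "b"}
print(select_segments([(1,), (2,), (3,)], db))
```

The first frame picks the variant recorded before frame (2,), the second falls back to its context-free entry, and the third finds nothing (None), which in the claim 1 scheme would be handed to the synthesizer.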
12. Claims 10 and 11 are rejected under 35 U.S.C. 103(a) as being unpatentable
over Page et al., in view of van Santen et al. (U.S. Patent 7,010,488).

In regard to claims 10 and 11, Page et al. do not disclose the segments within
the database are coded using linear predictive coding, GSM coding, or other coding 
schemes. 

Van Santen et al. disclose a speech synthesizer comprising a speech segment 
database, wherein the segments within the database are coded using linear predictive 
coding, GSM coding, or other coding schemes (LPC, column 4, lines 9-12).

It would have been obvious to one of ordinary skill in the art at the time of 
invention to modify Page et al. to code the segments in the database using linear 
predictive coding, GSM coding, or other coding schemes, because this would reduce
the size of the database, as taught by van Santen et al. (column 2, lines 60-64). 
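To illustrate why LPC coding reduces database size, the following sketch computes linear prediction coefficients for one frame using the autocorrelation method and the Levinson-Durbin recursion. It is a generic textbook formulation, not the coder of van Santen et al.; the function names are hypothetical, and a practical coder would also window the frame, quantize the coefficients, and store excitation information (e.g., a residual or pitch and voicing data).

```python
from typing import List, Tuple


def autocorrelation(x: List[float], max_lag: int) -> List[float]:
    """Autocorrelation r[0..max_lag] of one frame (no windowing applied)."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def lpc(x: List[float], order: int) -> Tuple[List[float], float]:
    """Levinson-Durbin recursion: returns predictor coefficients a[1..order],
    with x[n] ~= sum(a[k] * x[n - k]), and the final prediction error energy."""
    r = autocorrelation(x, order)
    if r[0] == 0.0:
        return [0.0] * order, 0.0  # silent frame: nothing to predict
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


frame = [0.9 ** n for n in range(200)]  # decaying test signal, x[n] = 0.9 * x[n-1]
coeffs, energy = lpc(frame, 2)
print(coeffs, energy)
```

For the test signal the recursion recovers a first coefficient of about 0.9 and a second of about zero. By the same token, a 160-sample frame (20 ms at an assumed 8 kHz sampling rate) could be summarized by, say, ten predictor coefficients plus a gain derived from the error energy and some excitation data, rather than by 160 raw samples.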

13. Claims 13, 14, 15, and 21 are rejected under 35 U.S.C. 103(a) as being
unpatentable over Page et al., in view of Official Notice. 

In regard to claims 13, 14, and 21, Page et al. do not disclose the precise period
length of the frames. 

Official Notice is taken that it is notoriously well-known in the art that frames are 
approximately 10 ms when working with speech signals, because this period length 
provides the best compromise between time domain resolution and frequency domain
resolution for slowly varying speech parameters. 



Application/Control Number: 10/645,677 Page 12 

Art Unit: 2626 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to modify Page et al. so that the frames were approximately 10 ms, because
this period length provides the best compromise between time domain resolution and
frequency domain resolution for slowly varying speech parameters. 
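The time and frequency tradeoff relied on in the Official Notice can be made concrete with simple arithmetic: a frame of duration T has a frequency resolution of roughly 1/T, so a 10 ms frame resolves frequency to about 100 Hz while allowing parameter updates 100 times per second. The sketch below computes these figures; the function name and the 8 kHz and 16 kHz sampling rates are illustrative assumptions.

```python
from typing import Tuple


def frame_stats(frame_ms: float, sample_rate_hz: int) -> Tuple[int, float, float]:
    """Return (samples per frame, DFT bin spacing in Hz, frames per second)."""
    samples = round(sample_rate_hz * frame_ms / 1000.0)
    bin_spacing_hz = sample_rate_hz / samples  # equals 1000 / frame_ms when exact
    frames_per_second = 1000.0 / frame_ms
    return samples, bin_spacing_hz, frames_per_second


for rate in (8000, 16000):
    print(rate, frame_stats(10.0, rate))
```

Doubling the frame to 20 ms halves the bin spacing to 50 Hz, a finer frequency resolution, but also halves the update rate to 50 frames per second, a coarser time resolution.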

In regard to claim 15, Page et al. disclose at each frame, an output waveform is
generated, these being reproduced in succession to create an impression of continuous
output (the retrieved recorded segments and the synthesized speech are concatenated 
to form a continuous output, see Fig. 2 and column 5, lines 46-47). 

Conclusion 

14. The prior art made of record and not relied upon is considered pertinent to 
applicant's disclosure. Henton (Challenges and Rewards in Using Parametric or
Concatenative Speech Synthesis) discloses the tradeoffs between parametric and
concatenative speech synthesis. Cruickshank (U.S. Patent Application Publication
2003/0158734) discloses a speech synthesizer that uses a traditional TTS converter
when a word is out-of-vocabulary. Cecys (U.S. Patent 5,704,007) discloses using both
recorded sources and generated sources for speech synthesis. Pechter et al. (U.S.
Patent 6,879,957) disclose a speech synthesizer that, when a whole word is not in a
database, generates the word by concatenating diphones from the database. 

15. Any inquiry concerning this communication or earlier communications from the
examiner should be directed to Brian L. Albertalli whose telephone number is
(571) 272-7616. The examiner can normally be reached on Mon - Fri, 8:00 AM - 5:30 PM,
every second Fri off.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, David Hudspeth can be reached on (571) 272-7843. The fax phone number 
for the organization where this application or proceeding is assigned is 571-273-8300. 

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 